Randomized Dual Coordinate Ascent with Arbitrary Sampling
Authors
Abstract
We study the problem of minimizing the average of a large number of smooth convex functions penalized with a strongly convex regularizer. We propose and analyze a novel primal-dual method (Quartz) which at every iteration samples and updates a random subset of the dual variables, chosen according to an arbitrary distribution. In contrast to typical analysis, we directly bound the decrease of the primal-dual error (in expectation), without the need to first analyze the dual error. Depending on the choice of the sampling, we obtain efficient serial, parallel and distributed variants of the method. In the serial case, our bounds match the best known bounds for SDCA (both with uniform and importance sampling). With standard mini-batching, our bounds predict initial data-independent speedup as well as additional data-driven speedup which depends on spectral and sparsity properties of the data. We calculate theoretical speedup factors and find that they are excellent predictors of actual speedup in practice. Moreover, we illustrate that it is possible to design an efficient mini-batch importance sampling. The distributed variant of Quartz is the first distributed SDCA-like method with an analysis for non-separable data.
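To make the serial case concrete, here is a minimal sketch of dual coordinate ascent with an arbitrary sampling distribution, assuming L2-regularized ridge regression (squared loss). It uses the standard closed-form SDCA coordinate step rather than Quartz's damped primal-dual update, and the function name and parameters are illustrative rather than taken from the paper.

```python
import numpy as np

def dca_arbitrary_sampling(A, y, lam, probs, n_epochs=20, seed=0):
    """Serial dual coordinate ascent for ridge regression:
        min_w (1/n) * sum_i 0.5 * (a_i^T w - y_i)^2 + (lam/2) * ||w||^2.
    Coordinate i of the dual vector alpha is sampled with probability
    probs[i]; any distribution with strictly positive entries is valid.
    """
    n, d = A.shape
    alpha = np.zeros(n)
    w = np.zeros(d)  # primal iterate, kept equal to A^T alpha / (lam * n)
    sq_norms = np.einsum("ij,ij->i", A, A)
    rng = np.random.default_rng(seed)
    for _ in range(n_epochs * n):
        i = rng.choice(n, p=probs)
        # exact maximization of the dual over coordinate i (squared loss)
        delta = (y[i] - A[i] @ w - alpha[i]) / (1.0 + sq_norms[i] / (lam * n))
        alpha[i] += delta
        w += (delta / (lam * n)) * A[i]
    return w, alpha
```

Uniform sampling corresponds to `probs = np.full(n, 1.0 / n)`; because each step exactly maximizes the dual over the chosen coordinate, changing the (strictly positive) distribution affects the speed of convergence but not the fixed point.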
Similar resources
Stochastic Dual Coordinate Ascent with Adaptive Probabilities
This paper introduces AdaSDCA: an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving regularized empirical risk minimization problems. Our modification allows the method to adaptively change the probability distribution over the dual variables throughout the iterative process. AdaSDCA achieves a provably better complexity bound than SDCA with the best fixed pr...
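As a rough illustration of the adaptive idea (a heuristic sketch, not AdaSDCA's exact rule), the sampling distribution can be recomputed periodically from the current dual residuals, so that coordinates far from their optimum are sampled more often. The weighting below is an assumption chosen to match the ridge-regression sketch above.

```python
import numpy as np

def adaptive_probs(A, y, alpha, w, lam, floor=1e-8):
    """Heuristic adaptive sampling distribution (hypothetical, only in the
    spirit of AdaSDCA): put more mass on coordinates with a large dual
    residual. For squared loss, the residual of coordinate i is
    alpha_i + phi_i'(a_i^T w) = alpha_i + a_i^T w - y_i.
    """
    n = A.shape[0]
    residuals = np.abs(alpha + A @ w - y)
    sq_norms = np.einsum("ij,ij->i", A, A)
    scores = residuals * np.sqrt(1.0 + sq_norms / (lam * n)) + floor
    return scores / scores.sum()
```

Recomputing `probs` once per epoch inside the serial loop above gives a cheap approximation of the adaptive scheme; the `floor` term keeps every probability strictly positive.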
Adaptive Stochastic Dual Coordinate Ascent for Conditional Random Fields
This work investigates training Conditional Random Fields (CRF) by Stochastic Dual Coordinate Ascent (SDCA). SDCA enjoys a linear convergence rate and strong empirical performance for independent classification problems. However, it has never been used to train CRFs. Yet it benefits from an exact line search with a single marginalization oracle call, unlike previous approaches. In this paper, ...
Stochastic Optimization with Importance Sampling
Uniform sampling of training data has been commonly used in traditional stochastic optimization algorithms such as Proximal Stochastic Gradient Descent (prox-SGD) and Proximal Stochastic Dual Coordinate Ascent (prox-SDCA). Although uniform sampling can guarantee that the sampled stochastic quantity is an unbiased estimate of the corresponding true quantity, the resulting estimator may have a ra...
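A static importance-sampling distribution can be built directly from the data. The sketch below assumes the squared-loss setup from the first snippet and weights example i by 1 + ||a_i||^2 / (lam * n), the curvature constant appearing in the closed-form coordinate step; the precise optimal weighting depends on the analysis, so this is a hedged choice rather than the paper's formula.

```python
import numpy as np

def importance_probs(A, lam):
    """Static importance sampling: examples with larger norms (harder
    coordinate subproblems) are sampled proportionally more often."""
    n = A.shape[0]
    sq_norms = np.einsum("ij,ij->i", A, A)
    scores = 1.0 + sq_norms / (lam * n)
    return scores / scores.sum()
```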
Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
We present and study a distributed optimization algorithm by employing a stochastic dual coordinate ascent method. Stochastic dual coordinate ascent methods enjoy strong theoretical guarantees and often perform better than stochastic gradient descent methods in optimizing regularized loss minimization problems. However, little effort has been made to study them in a distributed framework. We ...
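To illustrate the computation-communication trade-off, here is a single-process simulation of a CoCoA-style distributed scheme (a hypothetical sketch, not this paper's algorithm): each worker runs many local dual coordinate steps against a stale shared iterate, and only the averaged updates are communicated once per round.

```python
import numpy as np

def distributed_dca(A, y, lam, n_rounds=50, n_workers=4, local_steps=100, seed=0):
    """Simulated distributed dual coordinate ascent (hypothetical sketch).
    Each worker owns a disjoint block of dual coordinates; averaging the
    workers' deltas is the conservative 'safe' way to combine them."""
    n, d = A.shape
    alpha, w = np.zeros(n), np.zeros(d)
    sq_norms = np.einsum("ij,ij->i", A, A)
    blocks = np.array_split(np.arange(n), n_workers)
    rng = np.random.default_rng(seed)
    for _ in range(n_rounds):
        delta_w, delta_alpha = np.zeros(d), np.zeros(n)
        for block in blocks:
            w_local = w.copy()  # stale copy: one communication per round
            for _ in range(local_steps):
                i = rng.choice(block)
                step = (y[i] - A[i] @ w_local - alpha[i] - delta_alpha[i]) / (
                    1.0 + sq_norms[i] / (lam * n)
                )
                delta_alpha[i] += step
                w_local += (step / (lam * n)) * A[i]
            delta_w += w_local - w
        # scale primal and dual deltas identically, preserving the
        # correspondence w = A^T alpha / (lam * n)
        alpha += delta_alpha / n_workers
        w += delta_w / n_workers
    return w, alpha
```

Raising `local_steps` buys more computation per communication round; the 1/n_workers averaging damps conflicting updates from workers that acted on the same stale `w`.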
Journal: CoRR
Volume: abs/1411.5873
Pages: -
Publication year: 2014